Turkish Electricity Consumption is really important matter to work on due to supply’s dependence on consumption demand. Therefore careful and meticilous modelling and accurate estimations are required, and this project will present a way to perform such modelling and its results.

Data Import & Manipulation

Libraries

library(forecast)
library(tseries)
library(ggplot2)
library(zoo)
library(xts)
library(dplyr)
library(tidyr)
library(MLmetrics)
library(urca)
library(astsa)
library(readr)
library(data.table)
library(readr)

Consumption Data

Data is obtained from EPIAS - EXIST Transparency Platform. Following chunk imports and manipulates a little.

I will transform the data into daily series since it will be better to estimate daily values values as it was mentioned in the lectures. Can be extended to hourly easily.

# consumption indices and data
getwd()
bulk_consumption_with_temp <- read.csv("IEdata.csv")
names(bulk_consumption_with_temp) <- c("Date", "Hour", "Consumption")
bulk_consumption_with_temp <- as.data.table(bulk_consumption_with_temp)
bulk_consumption =bulk_consumption_with_temp %>% 
  group_by(Date) %>% 
  summarise(Consumption = sum(Consumption)
  )
bulk_consumption$Date <- as.Date(bulk_consumption$Date)
head(bulk_consumption)

Feature dataset is obtained from our group’s project feature dataset. It includes some thought features that may impact the consumption values. Appropriate length of data is given by 1484 value since the dataset is a little longer.

# feature indices and data
all_features_df <- read.csv("all_features_with_dummies.csv")
# head(all_features_df)
# tail(all_features_df)
dt_daily <- head(all_features_df, 1484)
# tail(dt_daily)
dt_daily <- as.data.table(dt_daily)
tail(dt_daily)
##          date covid_severity sunlight_time_minutes production_capacity_rate
## 1: 2021-01-18       49.48276                   579                     75.5
## 2: 2021-01-19       70.68966                   580                     75.5
## 3: 2021-01-20       51.60345                   582                     75.5
## 4: 2021-01-21       40.29310                   584                     75.5
## 5: 2021-01-22       45.09550                   586                     75.5
## 6: 2021-01-23       43.87804                   588                     75.5
##    price_of_electricity laborforce   export monday tuesday wednesday thursday
## 1:                551.5   50.90053 14486.97      1       0         0        0
## 2:                551.5   50.90053 14486.97      0       1         0        0
## 3:                551.5   50.90053 14486.97      0       0         1        0
## 4:                551.5   50.90053 14486.97      0       0         0        1
## 5:                551.5   50.90053 14486.97      0       0         0        0
## 6:                551.5   50.90053 14486.97      0       0         0        0
##    friday saturday sunday new_year nat_holiday sacrifice_holiday sacrifice_eve
## 1:      0        0      0        0           0                 0             0
## 2:      0        0      0        0           0                 0             0
## 3:      0        0      0        0           0                 0             0
## 4:      0        0      0        0           0                 0             0
## 5:      1        0      0        0           0                 0             0
## 6:      0        1      0        0           0                 0             0
##    ramadan_holiday ramadan_eve monday_or_friday_between_holidays extra_holidays
## 1:               0           0                                 0              0
## 2:               0           0                                 0              0
## 3:               0           0                                 0              0
## 4:               0           0                                 0              0
## 5:               0           0                                 0              0
## 6:               0           0                                 0              0
##    partial_lockdown full_lockdown
## 1:                1             0
## 2:                1             0
## 3:                1             0
## 4:                1             0
## 5:                1             0
## 6:                0             1

Dataset Division - Test & Train

The manipualtion, test and train dataset division is performed next. Train data is as specified in the data 2021-01-09. Test data is between 2021-01-10 and 2021-01-23, as it is available and will be used to measure performance only.

# test train prediction determination
test_dt_daily<- dt_daily[1471:(nrow(dt_daily)),]
dt_daily <- dt_daily[1:1470,]
guess_ahead <- nrow(test_dt_daily)

train_consumption_daily <- bulk_consumption$Consumption[1:1470]
test_consumption_daily <- tail(bulk_consumption$Consumption, (length(bulk_consumption$Consumption)-nrow(dt_daily)))

Stationarity

Time-series Properties

Time-series Conversion

Next, we will convert the dataset into a time-series dataset with weekly frequency as it is mentioned in the lectures, electricity consumption follows a weekly seasonal pattern. Therefore, frequency is set to 7.

Decomposition, residual analysis that defines stationarity property of the data (residuals) through unit-root test is also performed here.

Next, autocorrelation and partial auto-correlation graphs are given.

# time series conversion
ts_train_consumption_daily <- ts(train_consumption_daily, frequency = 7)
ts_test_consumption_daily <- ts(test_consumption_daily, frequency = 7)

decomposed_ts_train_consumption_daily <- decompose(ts_train_consumption_daily)
plot(decomposed_ts_train_consumption_daily)

decomp_ts_random <- decomposed_ts_train_consumption_daily$random
summary(ur.kpss(decomp_ts_random))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0028 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Seasonality pattern of weekly pattern is clear as it can be deduced from seasonal chart of the output. The random part that is residuals seems to have somewhat stationary property, but some outliers are required to be corrected.

Auto-Correlations

par(mfrow = c(3,1))
plot(decomp_ts_random)
acf(decomp_ts_random, na.action = na.pass)
pacf(decomp_ts_random, na.action = na.pass)

par(mfrow = c(1,1))

Auto-correlationns and partial-autocorrelations are as seen. Some values imply positive correlations, and we will first try logging to see whether there is a non-constant variance in the data. Afterwards, we will try weekly differencing as the plots imply lag7 autocorrelations a little.

Log-values

# logging the data
logged_ts_train_consumption_daily <- log(ts_train_consumption_daily)
logged_ts_test_consumption_daily <- log(ts_test_consumption_daily)

decomposed_logged_ts_train_consumption_daily <- decompose(logged_ts_train_consumption_daily)
plot(logged_ts_train_consumption_daily)

decomp_logged_ts_random <- decomposed_logged_ts_train_consumption_daily$random
summary(ur.kpss(decomp_logged_ts_random))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0028 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
par(mfrow = c(3,1))
plot(decomp_logged_ts_random)
acf(decomp_logged_ts_random, na.action = na.pass)
pacf(decomp_logged_ts_random, na.action = na.pass)

par(mfrow = c(1,1))

Since the logged values do not improve the residuals and stationarity, logging is deemed to be ineffective.

Differencing

# differencing with lag7
differenced_logged_ts_train_consumption_daily <- ts(diff(ts_train_consumption_daily, 7), frequency = 7)
decomposed_differenced_logged_ts_train_consumption_daily <- decompose(differenced_logged_ts_train_consumption_daily)
decomp_diffed_logged_ts_random <- decomposed_differenced_logged_ts_train_consumption_daily$random
summary(ur.kpss(decomp_diffed_logged_ts_random))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.004 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Differencing seems to be not effective as the unit-root test gives higher p-value. Therefore differencing will not be used in the optimal model.

Therefore we will use base values and correct them with outlier operations.

Outlier Optimization

First, we will get the indices of error outliers.

tsoutliers(decomp_ts_random)$index
##  [1]  121  139  174  177  178  180  241  243  244  245  247  248  366  478  486
## [16]  528  529  531  532  595  596  598  599  600  603  604  607  667  731  843
## [31]  851  882  883  885  886  887  890  926  951  954  955  956  957  972 1032
## [46] 1093 1096 1197 1209 1217 1218 1235 1237 1238 1241 1242 1244 1292 1305 1306
## [61] 1307 1308 1309 1311 1398 1462 1463

New Year Outliers

As the inspection shows, new year days are always outliers. First, we will get the indices of data to correct those values.

outlier_indexes <- tsoutliers(decomp_ts_random)$index
outlier_values <- tsoutliers(decomp_ts_random)$replacements
newyear_indices <- c(1,outlier_indexes[c(15,33,51,76)])
newyear_indices
## [1]    1  486  883 1218   NA

Results will be a little bit better with replacements.

Other Outliers

Getting rid of all these outliers and smoothing them via tsoutliers predictions:

ts_train_consumption_daily[outlier_indexes] <- outlier_values
summary(ur.kpss(decompose(ts_train_consumption_daily)$random))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0027 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
plot(decompose(ts_train_consumption_daily)$random)

Again, just a bit better stationarity.

Remaining Outliers

Next remaining outliers will be pointed out from data outliers, not error outliers. Then, they will be replaced again with the tsoutliers function estimated replacement values.

still_outliers <- tsoutliers(ts_train_consumption_daily)
ts_train_consumption_daily[still_outliers$index] <- still_outliers$replacements

summary(ur.kpss(decompose(ts_train_consumption_daily)$random))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.0029 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
plot(decompose(ts_train_consumption_daily)$random)

Forecasting

Performance Function

# performance metrics function

perf_dt=function(actual,forecast){
  n=length(actual)
  error=actual-forecast
  mean=mean(actual)
  sd=sd(actual)
  FBias=sum(error)/sum(actual)
  MPE=sum(error/actual)/n
  MAPE=sum(abs(error/actual))/n
  RMSE=sqrt(sum(error^2))/n
  MAD=sum(abs(error))/n
  WMAPE=MAD/mean
  l=data.frame(n,mean,sd,FBias,MAPE,RMSE,MAD,WMAPE)
  return(l)
}

Feature Data Manipulation fot Train & Test

xregr <- cbind(dt_daily$covid_severity,
               dt_daily$sunlight_time_minutes,
               dt_daily$production_capacity_rate,
               dt_daily$price_of_electricity,
               dt_daily$new_year,
               dt_daily$nat_holiday,
               dt_daily$sacrifice_holiday,
               dt_daily$sacrifice_eve,
               dt_daily$ramadan_holiday,
               dt_daily$ramadan_eve,
               dt_daily$monday_or_friday_between_holidays,
               dt_daily$extra_holidays,
               dt_daily$full_lockdown)


newxregr <- cbind(test_dt_daily$covid_severity,
                  test_dt_daily$sunlight_time_minutes,
                  test_dt_daily$production_capacity_rate,
                  test_dt_daily$price_of_electricity,
                  test_dt_daily$new_year,
                  test_dt_daily$nat_holiday,
                  test_dt_daily$sacrifice_holiday,
                  test_dt_daily$sacrifice_eve,
                  test_dt_daily$ramadan_holiday,
                  test_dt_daily$ramadan_eve,
                  test_dt_daily$monday_or_friday_between_holidays,
                  test_dt_daily$extra_holidays,
                  test_dt_daily$full_lockdown)

AR(1) Model

mmmf <- sarima.for(ts_train_consumption_daily, 1,0,0,0,0,0,7, n.ahead = guess_ahead, xreg = xregr,  newxreg = newxregr)

perf_dt(actual = test_consumption_daily, forecast = mmmf$pred)
##    n     mean       sd      FBias       MAPE     RMSE      MAD      WMAPE
## 1 14 890563.3 60458.13 0.06860152 0.07487503 20338.75 67918.91 0.07626511

MA(1) Model

mmmf <- sarima.for(ts_train_consumption_daily, 0,0,0,1,0,0,7, n.ahead = guess_ahead, xreg = xregr,  newxreg = newxregr)

perf_dt(actual = test_consumption_daily, forecast = mmmf$pred)
##    n     mean       sd      FBias       MAPE    RMSE      MAD      WMAPE
## 1 14 890563.3 60458.13 0.05072508 0.04995179 13896.3 45173.89 0.05072508

Since these model are manuel models, we will utilize auto.arima model to find better values for parameters

auto.arima

# auto.arima parameter finding

auto.arima(ts_train_consumption_daily, xreg = xregr, seasonal = T, trace = T)
## 
##  Fitting models using approximations to speed things up...
## 
##  Regression with ARIMA(2,0,2)(1,1,1)[7] errors : Inf
##  Regression with ARIMA(0,0,0)(0,1,0)[7] errors : 34432.15
##  Regression with ARIMA(1,0,0)(1,1,0)[7] errors : 32365.27
##  Regression with ARIMA(0,0,1)(0,1,1)[7] errors : 33453.6
##  ARIMA(0,0,0)(0,1,0)[7]                    : 34430.27
##  Regression with ARIMA(1,0,0)(0,1,0)[7] errors : 32780.48
##  Regression with ARIMA(1,0,0)(2,1,0)[7] errors : 32219.76
##  Regression with ARIMA(1,0,0)(2,1,1)[7] errors : Inf
##  Regression with ARIMA(1,0,0)(1,1,1)[7] errors : Inf
##  Regression with ARIMA(0,0,0)(2,1,0)[7] errors : 34425.36
##  Regression with ARIMA(2,0,0)(2,1,0)[7] errors : 32120.81
##  Regression with ARIMA(2,0,0)(1,1,0)[7] errors : 32265.15
##  Regression with ARIMA(2,0,0)(2,1,1)[7] errors : Inf
##  Regression with ARIMA(2,0,0)(1,1,1)[7] errors : Inf
##  Regression with ARIMA(3,0,0)(2,1,0)[7] errors : 32123.28
##  Regression with ARIMA(2,0,1)(2,1,0)[7] errors : 32122.44
##  Regression with ARIMA(1,0,1)(2,1,0)[7] errors : 32134.28
##  Regression with ARIMA(3,0,1)(2,1,0)[7] errors : 32125.31
##  ARIMA(2,0,0)(2,1,0)[7]                    : 32118.77
##  ARIMA(2,0,0)(1,1,0)[7]                    : 32263.11
##  ARIMA(2,0,0)(2,1,1)[7]                    : Inf
##  ARIMA(2,0,0)(1,1,1)[7]                    : Inf
##  ARIMA(1,0,0)(2,1,0)[7]                    : 32217.72
##  ARIMA(3,0,0)(2,1,0)[7]                    : 32121.24
##  ARIMA(2,0,1)(2,1,0)[7]                    : 32120.4
##  ARIMA(1,0,1)(2,1,0)[7]                    : 32132.23
##  ARIMA(3,0,1)(2,1,0)[7]                    : 32123.27
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(2,0,0)(2,1,0)[7]                    : 32271.43
## 
##  Best model: Regression with ARIMA(2,0,0)(2,1,0)[7] errors
## Series: ts_train_consumption_daily 
## Regression with ARIMA(2,0,0)(2,1,0)[7] errors 
## 
## Coefficients:
##          ar1      ar2     sar1     sar2     xreg1     xreg2     xreg3     xreg4
##       1.2126  -0.2956  -0.6608  -0.3182  112.2152  -45.5971  608.0319  -83.3959
## s.e.  0.0280   0.0282   0.0258   0.0254  130.4411  152.9512  808.4956  150.3839
##          xreg5      xreg6      xreg7       xreg8       xreg9      xreg10
##        948.939  -7626.194  -16866.05  -17913.766  -34941.865  -23690.760
## s.e.  4103.752   1750.308    6009.37    5290.854    5732.501    5197.141
##           xreg11     xreg12     xreg13
##       -28329.335  -4109.337  -8278.056
## s.e.    2883.339   2169.225   3451.554
## 
## sigma^2 estimated as 218805653:  log likelihood=-16117.48
## AIC=32270.95   AICc=32271.43   BIC=32366.14

Found parameters will be used in seasonal arima.

# sarima model

mmm <- sarima(ts_train_consumption_daily, 2,0,0,2,1,0,7, xreg = xregr)
## initial  value 10.392477 
## iter   2 value 10.208316
## iter   3 value 9.896441
## iter   4 value 9.806952
## iter   5 value 9.768203
## iter   6 value 9.744714
## iter   7 value 9.711160
## iter   8 value 9.675620
## iter   9 value 9.646956
## iter  10 value 9.617706
## iter  11 value 9.607328
## iter  12 value 9.601161
## iter  13 value 9.599885
## iter  14 value 9.598440
## iter  15 value 9.597334
## iter  16 value 9.596791
## iter  17 value 9.596054
## iter  18 value 9.595859
## iter  19 value 9.595760
## iter  20 value 9.595696
## iter  21 value 9.595623
## iter  22 value 9.595545
## iter  23 value 9.595454
## iter  24 value 9.595376
## iter  25 value 9.595336
## iter  26 value 9.595326
## iter  27 value 9.595324
## iter  28 value 9.595324
## iter  29 value 9.595324
## iter  30 value 9.595323
## iter  31 value 9.595323
## iter  31 value 9.595323
## iter  31 value 9.595323
## final  value 9.595323 
## converged
## initial  value 9.597938 
## iter   2 value 9.597831
## iter   3 value 9.597798
## iter   4 value 9.597796
## iter   5 value 9.597794
## iter   6 value 9.597793
## iter   7 value 9.597793
## iter   8 value 9.597792
## iter   9 value 9.597792
## iter   9 value 9.597792
## iter   9 value 9.597792
## final  value 9.597792 
## converged

mmm$ttable
##           Estimate        SE  t.value p.value
## ar1         1.2126    0.0280  43.2345  0.0000
## ar2        -0.2956    0.0282 -10.4747  0.0000
## sar1       -0.6608    0.0258 -25.5750  0.0000
## sar2       -0.3182    0.0254 -12.5093  0.0000
## xreg1     112.2152  130.4411   0.8603  0.3898
## xreg2     -45.5971  152.9512  -0.2981  0.7657
## xreg3     608.0319  808.4956   0.7521  0.4521
## xreg4     -83.3959  150.3839  -0.5546  0.5793
## xreg5     948.9390 4103.7520   0.2312  0.8172
## xreg6   -7626.1936 1750.3083  -4.3571  0.0000
## xreg7  -16866.0467 6009.3697  -2.8066  0.0051
## xreg8  -17913.7655 5290.8543  -3.3858  0.0007
## xreg9  -34941.8652 5732.5012  -6.0954  0.0000
## xreg10 -23690.7600 5197.1412  -4.5584  0.0000
## xreg11 -28329.3354 2883.3387  -9.8252  0.0000
## xreg12  -4109.3372 2169.2249  -1.8944  0.0584
## xreg13  -8278.0559 3451.5541  -2.3984  0.0166

Additional regressors are mostly significant that is good.

SARIMA Forecasting

# sarima forecasting

mmmf <- sarima.for(ts_train_consumption_daily, 2,0,0,2,1,0,7, n.ahead = guess_ahead, xreg = xregr, newxreg = newxregr)

# plot for forecast and actual
par(mfrow= c(2,1))
plot(mmmf$pred)
plot(ts_test_consumption_daily, type = "l")

par(mfrow= c(1,1))

Performance of SARIMA model

# performance metrics

perf_dt(actual = test_consumption_daily, forecast = mmmf$pred)
##    n     mean       sd      FBias      MAPE     RMSE      MAD      WMAPE
## 1 14 890563.3 60458.13 0.03548704 0.0393802 11350.44 35937.37 0.04035353

Next, we will try moving window approach to see if better results can be reached. Again, auto.arima parameters will be used.

SARIMA Moving Window Forecast

# moving_window approach



sliding_is_fun <- function(regressed = ts_train_consumption_daily, train_regressor = xregr, test_regressor = newxregr, sliding_window = 1)
{
  slicet <- nrow(test_regressor)
  holder <- nrow(regressed)
  pred <- c()
  # tester_regressor <- matrix(rep(1, 13), nrow = 1)
  for(i in 1:slicet)
  {
    tester_regressor <- matrix(c(test_regressor[i,]), nrow = 1)
    guessor <- sarima.for(regressed, 2,0,0,2,1,0,7, n.ahead = sliding_window, xreg = train_regressor, newxreg = tester_regressor)
    regressed <- c(regressed, guessor$pred)
    pred <- c(pred, guessor$pred)
    train_regressor <- rbind(train_regressor, tester_regressor)
  }
  return(pred)
}

moving_forecasted_values <- sliding_is_fun()

Forecast

moving_forecasted_values
##  [1] 758828.0 883726.8 899110.7 905243.5 887979.6 877708.8 826391.5 750780.0
##  [9] 875859.2 891912.6 898215.7 875773.8 870378.7 823517.1

Performance

perf_dt(actual = test_consumption_daily, forecast = moving_forecasted_values)
##    n     mean       sd      FBias       MAPE     RMSE      MAD     WMAPE
## 1 14 890563.3 60458.13 0.03548795 0.03938105 11350.71 35938.15 0.0403544

Results

Results show that we have a good model for estimating 14 days with around 0.04 WMAPE. It shows that our models are good and estimations can be used as proxy supply values that can guide suppliers.